# Medicaid MCO Procurement Claims Analysis Dataset

## Dataset Overview

This dataset contains extracted quantitative claims and partnership references from Medicaid managed care organization (MCO) procurement documents across 32 US states, spanning 2017-2024. The analysis supports the manuscript "Evaluating Medicaid Managed Care Organization Accountability: A Large-Scale Analysis of RFP Response Claims Across 32 US States."

## Citation

Basu S. Medicaid MCO Procurement Claims Analysis Dataset. Harvard Dataverse. 2024. https://doi.org/[DOI]

## Data Collection

- **Source**: Publicly available state Medicaid procurement documents (RFPs, proposals, contracts, scoring materials)
- **Time Period**: 2017-2024
- **Geographic Coverage**: 32 US states and District of Columbia
- **Collection Methods**: State procurement portals, Medicaid agency websites, FOIA requests

## Files Included

### Primary Data Files

| File | Description | Format | Records |
|------|-------------|--------|---------|
| `claims_extracted.csv` | All extracted quantitative claims | CSV | 55,914 |
| `partnerships_extracted.csv` | All extracted partnership references | CSV | 55,364 |
| `document_inventory.csv` | Source document catalog | CSV | 265 |

### Summary Files

| File | Description | Format |
|------|-------------|--------|
| `claim_summary.json` | Aggregate statistics for claims | JSON |
| `state_summary.csv` | Claims by state | CSV |
| `hedis_outcomes.csv` | NCQA HEDIS validation data | CSV |

### Documentation

| File | Description |
|------|-------------|
| `codebook.md` | Variable definitions and coding schemes |
| `extraction_methods.md` | Technical documentation for extraction pipeline |

## Variable Definitions

See `codebook.md` for complete variable definitions.

### Key Variables in claims_extracted.csv

- `claim_id`: Unique identifier
- `state`: US state of origin
- `document_type`: RFP, Proposal, Contract, etc.
- `claim_text`: Verbatim extracted text
- `pattern_type`: Classification (metric, improvement, change, target, rate)
- `numeric_value`: Extracted numeric value (if applicable)
- `year`: Procurement year
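
As a starting point for working with these variables, the following minimal sketch tallies claims by `state` and `pattern_type` using only the Python standard library. It assumes the CSV has a header row with the column names listed above and that a missing `numeric_value` is stored as an empty field; neither detail is confirmed here, so check `codebook.md` first. The helper names `count_claims` and `parse_numeric` are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_claims(csv_file, by=("state", "pattern_type")):
    """Count claim rows for each combination of the columns named in `by`."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[tuple(row[col] for col in by)] += 1
    return counts


def parse_numeric(value):
    """Return numeric_value as a float, or None when the field is empty.

    Assumption: missing values are empty strings, not a token such as "NA".
    """
    value = (value or "").strip()
    return float(value) if value else None


# Made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    "claim_id,state,pattern_type,numeric_value,year\n"
    "1,XX,rate,85,2020\n"
    "2,XX,rate,,2020\n"
    "3,YY,target,90.5,2021\n"
)
for key, n in sorted(count_claims(sample).items()):
    print(key, n)
```

To run the same tally on the real file, pass an open file handle instead of the in-memory sample, for example `open("claims_extracted.csv", newline="")`.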

## Methodology

Claims were extracted using a hybrid approach:
1. **Pattern-based extraction**: Regular expressions identifying quantitative statements
2. **Quality metric detection**: HEDIS, CAHPS, and state-specific measure references
3. **Validation**: Manual review of a random sample (n=200), with Cohen's kappa > 0.85
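
The steps above can be sketched roughly as follows. This is an illustration and not the pipeline documented in `extraction_methods.md`: the regular expressions, the two-way `improvement`/`rate` classification rule, and the function names `extract_claims` and `cohens_kappa` are all assumptions made for the example. The kappa function implements the standard two-rater formula; which two sets of labels were compared in the actual validation is not stated here.

```python
import re
from collections import Counter

# Hypothetical patterns; the real pipeline's expressions may differ.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*(?:%|percent\b)", re.IGNORECASE)
MEASURE = re.compile(r"\b(HEDIS|CAHPS)\b")
IMPROVEMENT = re.compile(r"\b(?:improv|increas|reduc|decreas)\w*", re.IGNORECASE)


def extract_claims(text):
    """Return one record per percentage found, with its sentence as context."""
    claims = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in PERCENT.finditer(sentence):
            claims.append({
                "claim_text": sentence.strip(),
                "numeric_value": float(match.group(1)),
                # Assumed rule: a change verb in the sentence marks an improvement claim.
                "pattern_type": "improvement" if IMPROVEMENT.search(sentence) else "rate",
                "measures": sorted(set(MEASURE.findall(sentence))),
            })
    return claims


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


text = ("We improved our HEDIS well-child visit score by 12.5%. "
        "Member satisfaction reached 90 percent on CAHPS.")
for claim in extract_claims(text):
    print(claim["pattern_type"], claim["numeric_value"], claim["measures"])
```

Splitting on sentence boundaries before matching keeps each extracted number attached to its surrounding claim text, at the cost of mishandling abbreviations that end in a period.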

## Limitations

- Document availability varies with each state's public records practices
- Some documents are redacted or unavailable
- Pattern-based extraction may miss claims in non-standard formats
- HEDIS data are limited to the 2023 measurement year

## Ethical Considerations

This dataset draws only on publicly available government documents and aggregate quality data. No individually identifiable information is included.

## Funding

This research was funded by Waymark.

## Contact

Sanjay Basu, MD, PhD
Waymark
sanjay.basu@waymarkcare.org

## License

CC BY 4.0 - Creative Commons Attribution 4.0 International
